Received: by cheltenham.cs.arizona.edu; Mon, 29 May 1995 13:25:48 MST
From: Phil Bralich <bralich@uhunix.uhcc.Hawaii.Edu>
To: icon-group@cs.arizona.edu
Subject: Machine Usable Dictionary
Content-Length: 1694
Message-Id: <95May28.212314hst.97208@uhunix2.uhcc.Hawaii.Edu>
Date: Sun, 28 May 1995 21:23:08 -1000
Errors-To: icon-group-errors@cs.arizona.edu
As you may know from postings I have made to this list over the last
couple of months, Derek Bickerton and I are developing a parser
based on a theory of syntax that he and I have been developing over
the last four years. We are about to purchase a machine usable
dictionary with approximately 70,000 entries for $2500. If anyone
could advise us whether or not that is our best bet, or where we might
find other dictionaries, we would appreciate hearing from you.
We are currently working with a dictionary of under 1000 words, so it
is imperative that we obtain a larger one, so we may begin working
with larger corpora. Toward that end we would also like to find out
which texts were used in past parsing competitions and where the
results of these competitions are published. We believe that with a
few weeks of work we should be able to modify a dictionary
sufficiently to allow us to begin experimenting with texts that were
used in past parsing competitions.
Here are the specs for the parser. It is based on a series of algorithms that
have been four years in the making, but the programming required to
create this parser has taken only 300 hours using C++. There
are approximately 3000 lines of code that compile to a 150k executable on
disk. About 100k of RAM is required to run the parser. 30k on disk is
required for a 300 word dictionary. An average sentence takes under
4 seconds to process on a 486 IBM compatible. Since this is only a
development version, we expect these numbers to change. To date, no
optimizations have been made, and we expect to significantly shrink the
dictionary disk usage and the execution time.
Phil Bralich
bralich@uhccux.uhcc.Hawaii.edu